Part 3: ANOVA and General Linear Model
Psychology and Child Development
Regression is only one member of the General Linear Model (GLM) family.
The GLM covers models such as Analysis of Variance (ANOVA), Multivariate Analysis of Variance (MANOVA), the Classical Regression Model (linear regression), and the \(t\)-test.
My aim was to introduce these topics as special cases of the GLM, or better said, as members of one family.
There is a large body of theory behind these models, but we don’t have time to discuss every detail. More advanced classes are needed to cover the particulars.
graph TD
A[GLM] --> B(ANOVA)
A[GLM] --> C(MANOVA)
A[GLM] --> D(t-test)
A[GLM] --> E(Linear Regression)
Analysis of Variance (ANOVA) is a model for analyzing mean differences across multiple groups. It is very similar to the Classical Regression Model, which is why I included ANOVA in the Classical Regression Model lecture.
The ANOVA model tests the null hypothesis that all group means are equal. With three groups, for example, it can be written like this:
\[\begin{equation} H_{0}: \mu_{1} = \mu_{2} = \mu_{3} \end{equation}\]
This is what we call an omnibus test: we are testing the overall effect of our independent variable on the dependent variable, not any particular pair of groups.
As always, let’s understand this with an example:
| maritl | M | SD | variance |
|---|---|---|---|
| 1. Never Married | 92.73 | 32.92 | 1083.73 |
| 2. Married | 118.86 | 43.12 | 1859.38 |
| 3. Widowed | 99.54 | 23.74 | 563.64 |
| 4. Divorced | 103.16 | 33.80 | 1142.51 |
| 5. Separated | 101.22 | 33.66 | 1133.22 |
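A summary table like the one above can be built with a few lines of base R. This is only a minimal sketch on hypothetical toy data (three made-up groups, not the wage data); `score`, `group`, and `describe` are names I made up for the illustration.

```r
# Hypothetical toy data (not the wage data): three groups of three scores each
score <- c(4, 5, 6, 7, 8, 9, 10, 11, 12)
group <- factor(rep(c("a", "b", "c"), each = 3))

# Per-group mean (M), standard deviation (SD), and variance
describe <- function(x) c(M = mean(x), SD = sd(x), variance = var(x))

# split() cuts the scores by group; sapply() summarizes each piece;
# t() puts one group per row, as in the table above
desc <- t(sapply(split(score, group), describe))
print(round(desc, 2))
```

Note that the variance column is simply the SD column squared, which is also true of the wage table up to rounding.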
Are the mean differences in wage by marital status explainable by chance alone?
In this case ANOVA can tell us that there is a difference somewhere among the groups, but not where.
To find out where, we follow the ANOVA with pairwise comparisons. For example, we could compare the mean wage of divorced males with the mean wage of married males, and then test every other pair of groups as well. This is called a post hoc analysis.
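To get a feeling for how many pairwise comparisons that implies, we can count the distinct pairs among the five marital status groups in our example:

\[\begin{equation} \binom{5}{2} = \frac{5 \times 4}{2} = 10 \end{equation}\]

So a full post hoc analysis of this example involves ten comparisons, which is why the p-values are usually adjusted for multiple testing.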
Like every member of the GLM family, ANOVA can be written in the same general form:
\[\begin{equation} outcome_{i} = (model) + error_{i} \end{equation}\]
This means that ANOVA accounts for the variance within your groups (the error) and also calculates the variance that is explained by your model. In fact, the Classical Regression Model does exactly the same thing.
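One way to see that ANOVA and regression do the same thing is to fit both to the same data and compare the F statistic. The sketch below uses hypothetical toy data (not the wage data); the variable names are my own.

```r
# Hypothetical toy data: three groups of three scores each
score <- c(4, 5, 6, 7, 8, 9, 10, 11, 12)
group <- factor(rep(c("a", "b", "c"), each = 3))

# Same model, two interfaces
fit_aov <- aov(score ~ group)  # ANOVA
fit_lm  <- lm(score ~ group)   # regression with dummy-coded groups

# Omnibus F statistic from each fit
f_aov <- summary(fit_aov)[[1]][["F value"]][1]
f_lm  <- unname(summary(fit_lm)$fstatistic["value"])

print(c(aov = f_aov, lm = f_lm))
```

Both calls should report the same F value, because `aov()` fits the model through the same least squares machinery as `lm()`.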
Remember: the line of best fit in regression is found by adding up the squared distances between the observations and the line. The line with the smallest sum of squared distances is the best line in regression.
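As a quick reminder of that idea, here is a minimal sketch with made-up numbers. The helper `sse()` and the competing line are hypothetical, chosen only to show that the least squares line has the smaller sum of squared distances.

```r
# Hypothetical toy data
x <- 1:5
y <- c(2, 4, 5, 4, 5)

# Sum of squared vertical distances between the points and a given line
sse <- function(intercept, slope) sum((y - (intercept + slope * x))^2)

# Least squares line from lm()
best <- unname(coef(lm(y ~ x)))
sse_best <- sse(best[1], best[2])

# An arbitrary competing line (same slope, slightly lower intercept)
sse_other <- sse(2, 0.6)

print(c(best = sse_best, other = sse_other))
```

Any other line you try will give a sum of squares at least as large as the one from `lm()`.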
We are going to do something similar in ANOVA. The Total Sum of Squares (\(SS_{T}\)) adds up the squared distance of every observation from the grand mean:
\[\begin{equation} SS_{T} = \sum^N_{i=1}(x_{i}-\bar{x}_{grandMean})^2 \end{equation}\]
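In R the total sum of squares takes one line once we have the grand mean. A minimal sketch on hypothetical toy data (the helper name `ss_total` is my own):

```r
# Hypothetical toy data: nine scores, groups ignored for now
score <- c(4, 5, 6, 7, 8, 9, 10, 11, 12)

# Total sum of squares: squared distance of every score from the grand mean
ss_total <- function(x) sum((x - mean(x))^2)

grand_mean <- mean(score)
total <- ss_total(score)

print(grand_mean)
print(total)
```

Assuming the wage data are loaded as a data frame called `Wage`, the same helper could be applied to the wage column in the same way.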
We can continue with our example, wage \(\sim\) maritl.
From the grand mean of wage, we can calculate the total sum of squares:
R:[1] 5222086
The total \(SS\) is 5222086. This is the TOTAL variation within the data.
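The next piece is the model sum of squares (\(SS_{M}\)), the variation explained by the group means:
\[\begin{equation} SS_{M} = \sum^{K}_{k=1}n_{k}(\bar{x}_{k}-\bar{x}_{grandMean})^2 \end{equation}\]
Here \(k\) indexes the groups, \(K\) is the number of groups, and \(n_{k}\) is the number of observations in group \(k\). In simple words, we are measuring how far each group mean is from the grand mean, weighted by the size of the group.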
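The model sum of squares can be computed by hand in the same spirit. Again, this is a sketch on hypothetical toy data, with variable names of my own choosing.

```r
# Hypothetical toy data: three groups of three scores each
score <- c(4, 5, 6, 7, 8, 9, 10, 11, 12)
group <- factor(rep(c("a", "b", "c"), each = 3))

grand_mean  <- mean(score)
group_means <- tapply(score, group, mean)    # mean of each group
group_sizes <- tapply(score, group, length)  # number of scores in each group

# Squared distance of each group mean from the grand mean, weighted by group size
ss_model <- sum(group_sizes * (group_means - grand_mean)^2)

print(group_means)
print(ss_model)
```

If the group means were all equal to the grand mean, this quantity would be zero: the model would explain nothing.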
We also need the residuals of our model: the residual sum of squares is the variation not explained by our ANOVA model. The fitted model reports both sums of squares:
            Df  Sum Sq Mean Sq F value Pr(>F)
maritl       4  363144   90786   55.96 <2e-16 ***
Residuals 2995 4858941    1622
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
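To see where the numbers in an ANOVA table come from, the sketch below rebuilds the sums of squares and the F value by hand and compares them with `aov()`. It uses hypothetical toy data, not the wage data, and the variable names are my own.

```r
# Hypothetical toy data: three groups of three scores each
score <- c(4, 5, 6, 7, 8, 9, 10, 11, 12)
group <- factor(rep(c("a", "b", "c"), each = 3))

grand_mean <- mean(score)
own_group_mean <- ave(score, group)  # each score's own group mean

ss_total <- sum((score - grand_mean)^2)           # total variation
ss_model <- sum((own_group_mean - grand_mean)^2)  # explained by the groups
ss_resid <- sum((score - own_group_mean)^2)       # left unexplained

# Mean squares divide each sum of squares by its degrees of freedom
k <- nlevels(group)
n <- length(score)
f_value <- (ss_model / (k - 1)) / (ss_resid / (n - k))

# The same quantities as reported by aov()
aov_table <- summary(aov(score ~ group))[[1]]
print(c(total = ss_total, model = ss_model, residual = ss_resid, F = f_value))
print(aov_table)
```

The same identity holds in the wage table above: 363144 + 4858941 = 5222085, which matches the total of 5222086 up to rounding of the printed values. Likewise 363144 / 4 = 90786, 4858941 / 2995 is about 1622, and the ratio of the two mean squares is about 55.96.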
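library(ISLR)        # assumption: source of the Wage data
library(ggplot2)
library(viridis)     # scale_fill_viridis()
library(dplyr)       # %>% and rename()
library(kableExtra)  # kbl() and kable_classic_2()

ggplot(data = Wage, aes(x = maritl, y = wage, fill = maritl)) +
  geom_boxplot() +
  stat_summary(fun = "mean", geom = "point", color = "red") +  # red point = group mean
  scale_fill_viridis(discrete = TRUE, alpha = 0.6, option = "A") +
  theme_classic() +
  theme(legend.position = "none") +
  labs(x = "Marital Status", y = "Wage", title = "Boxplot of Wage by Marital Status")

modelAnova <- aov(wage ~ maritl, data = Wage)  # assumption: the fit behind the ANOVA table above
pairTest <- TukeyHSD(modelAnova)               # Tukey HSD post hoc comparisons
as.data.frame(lapply(pairTest, function(x) round(x, 2))) %>%
  rename(meanDiff = maritl.diff,
         lower = maritl.lwr,
         upper = maritl.upr,
         pvalueAdj = maritl.p.adj) %>%
  kbl(caption = "Pairwise Contrast") %>%
  kable_classic_2("hover", full_width = TRUE,
                  bootstrap_options = c("striped", "hover", "condensed", "responsive"))

| Contrast | meanDiff | lower | upper | pvalueAdj |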
|---|---|---|---|---|
| 2. Married-1. Never Married | 26.13 | 21.18 | 31.07 | 0.00 |
| 3. Widowed-1. Never Married | 6.80 | -18.78 | 32.39 | 0.95 |
| 4. Divorced-1. Never Married | 10.42 | 1.60 | 19.25 | 0.01 |
| 5. Separated-1. Never Married | 8.48 | -6.96 | 23.92 | 0.56 |
| 3. Widowed-2. Married | -19.32 | -44.66 | 6.02 | 0.23 |
| 4. Divorced-2. Married | -15.70 | -23.77 | -7.63 | 0.00 |
| 5. Separated-2. Married | -17.64 | -32.66 | -2.63 | 0.01 |
| 4. Divorced-3. Widowed | 3.62 | -22.75 | 29.99 | 1.00 |
| 5. Separated-3. Widowed | 1.68 | -27.58 | 30.93 | 1.00 |
| 5. Separated-4. Divorced | -1.94 | -18.65 | 14.76 | 1.00 |
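A practical way to read this table: a contrast whose interval from lower to upper does not contain zero is the same contrast whose adjusted p-value is below 0.05. In the table above that is the case for four contrasts (Married vs. Never Married, Divorced vs. Never Married, Divorced vs. Married, and Separated vs. Married). The sketch below applies that check to hypothetical toy data; the names `tukey` and `excludes_zero` are my own.

```r
# Hypothetical toy data: three groups of three scores each
score <- c(4, 5, 6, 7, 8, 9, 10, 11, 12)
group <- factor(rep(c("a", "b", "c"), each = 3))

fit <- aov(score ~ group)

# Matrix with columns diff, lwr, upr, and p adj; one row per pair of groups
tukey <- TukeyHSD(fit)$group

# A contrast is flagged when its confidence interval does not contain zero
excludes_zero <- tukey[, "lwr"] > 0 | tukey[, "upr"] < 0

print(cbind(round(tukey, 2), excludes_zero))
```

With the default 95% family-wise confidence level of `TukeyHSD()`, this flag agrees with an adjusted p-value below 0.05.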